

Confident Learning

Published

July 8, 2022

This page contains my reading notes on

  • Confident Learning: Estimating Uncertainty in Dataset Labels

Notations:

  • All symbols with a superscript * refer to the unknown, true labels.

  • All symbols with a tilde (\sim) refer to the given, noisy labels.

  • All symbols with a hat (\hat{\cdot}) refer to estimates produced by the trained model.

The procedure needs 2 inputs:

  • Out-of-sample predicted probabilities \hat{\mathbf{P}}: a matrix of n rows (# of training instances) and m columns (labels).

    • CL requires users to train a model on the training set using cross-validation, so that every predicted probability is out-of-sample (a sketch of this step follows the list).

    • The model must be able to output predicted probabilities for all possible labels.

  • The given labels \tilde{\mathbf{y}}: a vector of length n (# of training instances).
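A minimal sketch of how these two inputs might be produced, assuming scikit-learn; the classifier and the synthetic dataset below are placeholders for whatever model and data are actually used, not something prescribed by the paper.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

# Placeholder data: y_tilde plays the role of the given (possibly noisy) labels.
X, y_tilde = make_classification(n_samples=1000, n_features=20, n_informative=5,
                                 n_classes=3, random_state=0)

# Out-of-sample predicted probabilities: each row of P_hat comes from a fold
# whose model never saw that instance during training.
P_hat = cross_val_predict(LogisticRegression(max_iter=1000), X, y_tilde,
                          cv=5, method="predict_proba")
# P_hat has shape (n, m); y_tilde has shape (n,).
```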

Five Methods to identify instances with noisy labels

1. CL baseline 1: C_{confusion}

An instance is considered to have a noisy label if its given label differs from the label with the largest predicted probability (the model's argmax prediction).
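A sketch of this baseline with NumPy, reusing the hypothetical P_hat and y_tilde arrays from the input sketch above; the function name is mine, not from the paper.

```python
import numpy as np

def noisy_by_confusion(P_hat, y_tilde):
    """Flag instances whose given label differs from the argmax predicted label."""
    return np.argmax(P_hat, axis=1) != y_tilde

# noisy_mask = noisy_by_confusion(P_hat, y_tilde)  # boolean mask of length n
```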

2. CL method 2: C_{\tilde{y}, y^{*}}

In this method, a matrix called the confident joint C_{\tilde{y}, y^{*}} is calculated from \hat{\mathbf{P}} and \tilde{\mathbf{y}}, for example:

C_{\tilde{y}, y^{*}} | y^{*} = 0 | y^{*} = 1 | y^{*} = 2
\tilde{y} = 0        |    100    |     40    |     20
\tilde{y} = 1        |     56    |     60    |      0
\tilde{y} = 2        |     32    |     12    |     80

To calculate this matrix:

  1. For each label j, calculate the threshold t_{j}: the average predicted probability \hat{\mathbf{P}}_{k, j} over all instances \mathbf{x}_{k} whose given label is j.

  2. For each instance \mathbf{x}_{k} with given label i in the training set, add 1 to the entry at row i and column j of the confident joint, C_{\tilde{y}=i, y^{*}=j}, where the suspected true label j is the label with the largest predicted probability among all labels whose predicted probabilities are above their respective thresholds t_{j}.

    • This basically means that the suspected true label of an instance is a label whose predicted probability exceeds that label's average predicted probability (its threshold).

    • If there is more than one such label, choose the one with the largest predicted probability.

    • It is possible that no such label exists, and thus the instance won’t be counted in the matrix.

Thus, each entry in C_{\tilde{y}, y^{*}} corresponds to a set of training instances.

All instances that fall in an off-diagonal entry of C_{\tilde{y}, y^{*}} are considered to have noisy labels.
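A sketch of the confident joint following the steps above (ignoring the calibration details in the paper); the function name and the bookkeeping of which instances land in each entry are my own.

```python
import numpy as np

def confident_joint(P_hat, y_tilde):
    """Count matrix C[i, j] plus the instances behind each entry."""
    n, m = P_hat.shape
    # Threshold t_j: average predicted probability of label j over the
    # instances whose given label is j.
    t = np.array([P_hat[y_tilde == j, j].mean() for j in range(m)])

    C = np.zeros((m, m), dtype=int)
    instances = [[[] for _ in range(m)] for _ in range(m)]
    for k in range(n):
        above = np.where(P_hat[k] >= t)[0]        # labels above their thresholds
        if above.size == 0:
            continue                              # x_k is not counted at all
        j = above[np.argmax(P_hat[k, above])]     # largest probability among them
        i = y_tilde[k]
        C[i, j] += 1
        instances[i][j].append(k)
    return C, instances

# C, instances = confident_joint(P_hat, y_tilde)
# Noisy candidates for this method: instances[i][j] for every i != j.
```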

3. CL method 3: Prune by Class (PBC)

In this method and all methods below, another matrix, the estimate of the joint \hat{Q}_{\tilde{y}, y^{*}}, is calculated from C_{\tilde{y}, y^{*}}, for example:

\hat{Q}_{\tilde{y}, y^{*}} | y^{*} = 0 | y^{*} = 1 | y^{*} = 2
\tilde{y} = 0              |   0.25    |   0.1     |   0.05
\tilde{y} = 1              |   0.14    |   0.15    |   0
\tilde{y} = 2              |   0.08    |   0.03    |   0.2

\hat{Q}_{\tilde{y}, y^{*}} is basically the normalized C_{\tilde{y}, y^{*}}: each entry in C_{\tilde{y}, y^{*}} is divided by the total number of training instances.

For each class i, the a instances with the lowest predicted probabilities for label i (among the instances whose given label is i) are considered to have noisy labels, where a is the product of n and the sum of the off-diagonal entries in row i of \hat{Q}_{\tilde{y}, y^{*}}.
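A sketch of \hat{Q}_{\tilde{y}, y^{*}} and PBC under the description above; rounding the per-class count a to an integer is my assumption, and the function names are placeholders.

```python
import numpy as np

def estimate_joint(C, n):
    """Q_hat: each entry of the confident joint divided by the number of instances."""
    return C / n

def prune_by_class(P_hat, y_tilde, Q_hat):
    """PBC: per class i, flag the a_i instances given label i with the lowest
    predicted probability for label i, a_i = n * (off-diagonal mass of row i)."""
    n, m = P_hat.shape
    noisy = np.zeros(n, dtype=bool)
    for i in range(m):
        idx = np.where(y_tilde == i)[0]
        a_i = int(round(n * (Q_hat[i].sum() - Q_hat[i, i])))
        if a_i <= 0 or idx.size == 0:
            continue
        lowest = idx[np.argsort(P_hat[idx, i])[:a_i]]   # least self-confident first
        noisy[lowest] = True
    return noisy

# Q_hat = estimate_joint(C, len(y_tilde))
# noisy_pbc = prune_by_class(P_hat, y_tilde, Q_hat)
```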

4. CL method 4: Prune by Noise Rate (PBNR)

For each off-diagonal entry (i, j) of \hat{Q}_{\tilde{y}, y^{*}}, the n \times \hat{Q}_{\tilde{y}=i, y^{*}=j} instances with the largest margin among those with given label i are considered to have noisy labels, where the margin of an instance \mathbf{x}_{k} with respect to given label i and candidate true label j is \hat{\mathbf{P}}_{k, j} - \hat{\mathbf{P}}_{k, i}.
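A sketch of PBNR in the same style; again, rounding the per-entry counts to integers is my assumption.

```python
import numpy as np

def prune_by_noise_rate(P_hat, y_tilde, Q_hat):
    """PBNR: for each off-diagonal entry (i, j), flag the n * Q_hat[i, j] instances
    with given label i that have the largest margin P_hat[k, j] - P_hat[k, i]."""
    n, m = P_hat.shape
    noisy = np.zeros(n, dtype=bool)
    for i in range(m):
        idx = np.where(y_tilde == i)[0]
        if idx.size == 0:
            continue
        for j in range(m):
            if j == i:
                continue
            a_ij = int(round(n * Q_hat[i, j]))
            if a_ij <= 0:
                continue
            margin = P_hat[idx, j] - P_hat[idx, i]
            noisy[idx[np.argsort(-margin)[:a_ij]]] = True   # largest margins first
    return noisy

# noisy_pbnr = prune_by_noise_rate(P_hat, y_tilde, Q_hat)
```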

5. CL method 5: C + NR

An instance is considered to have a noisy label only if both PBC and PBNR consider it to have a noisy label.
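With the two boolean masks from the PBC and PBNR sketches above, this combination is just their intersection:

```python
# C+NR: flag an instance only if both PBC and PBNR flag it.
noisy_cnr = noisy_pbc & noisy_pbnr
```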